(57) Abstract

The invention relates to an area-efficient realization of coefficient block [A], or
architecture [A], with hardware sharing techniques and optimizations applied to this
block. Block [A] is connected to coefficient lines CLin_0, CLin_1, ..., CLin_n and
BLin_0, BLin_1, ..., BLin_n coming from block [E] and/or [F], so as to perform a
filtering operation or a mathematical computing operation with optimized hardware,
and provides a zero latency output. The invention also gives the area-minimal
realization of digital filters based on coefficient block [A] when operated in
bit-serial fashion. The optimization techniques and structure of the present
invention suit bit-serial digital filters, typically finite impulse response (FIR)
and infinite impulse response (IIR) filters, as well as other filters and
applications based on combinational logic consisting of delay elements (T),
multipliers (M), serial adders (SA) and serial subtractors (SS).



[Figure: minimizations in the proposal, showing coefficient block (A)]

WO 00/22729 







AREA EFFICIENT REALIZATION OF COEFFICIENT ARCHITECTURE FOR BIT-SERIAL FIR, IIR FILTERS AND 
COMBINATIONAL/SEQUENTIAL LOGIC STRUCTURE WITH ZERO LATENCY CLOCK OUTPUT 

FIELD OF INVENTION 

The invention relates to an area-efficient realization of coefficient block [A], or
architecture [A], with hardware sharing techniques and optimizations applied to this
block. Block [A] is connected to coefficient lines CLin_0, CLin_1, ..., CLin_n and
BLin_0, BLin_1, ..., BLin_n coming from block [E] and/or [F], so as to perform a
filtering operation or a mathematical computing operation with optimized hardware,
and provides a zero latency output. The invention also gives the area-minimal
realization of digital filters based on coefficient block [A] when operated in
bit-serial fashion. The optimization techniques and structure of the present
invention suit bit-serial digital filters, typically finite impulse response (FIR)
and infinite impulse response (IIR) filters, as well as other filters and
applications based on combinational logic consisting of delay elements (T),
multipliers (M), serial adders (SA) and serial subtractors (SS).

Brief description of the accompanying drawings 
In the accompanying drawings:

Figure 1 shows the field of invention and applications of the device.
Figure 2 shows the symbols of the components used in the device.
Figure 3 shows the description of the components used in the device.
Figure 4 shows the bit-serial FIR filter implementations.
Figure 5 shows an example of an FIR filter.
Figure 6 shows one of the known minimization techniques, based on symmetry of the
coefficients.
Figure 7 shows the structure of the prior/known implementation technique for the
coefficient block.
Figure 8 shows the generalized structure of the prior/known implementation technique
for the coefficient block.
Figure 9 shows the minimization technique involved in the FIR filter.
Figure 10 shows the generalized structure of the minimization technique involved in
the FIR filter.
Figure 11 shows the minimized structure of this example FIR filter, of the present
invention.
Figure 12 shows the generalized optimized structure of the present invention.
Figure 13 shows the other advantage of the structure of the present invention, i.e.
getting the parallel output directly.

Details of Elements/symbol used in the description 

The basic components symbol used in design are shown in "Figure 2" of the 
drawings. In addition, explanation and usages of the device are done in the text 
below and depicted in "Figure 3" and "Figure 4" of the drawings. 
Unit delay (T) 

It is a one-bit delay element. It also performs the function of a multiplier by a
factor of 2 [e.g. for the serial input frame 0101011 in binary (43 in integer
representation), the output of this block is 01010110 in binary (86 in integer
representation)]. This element is usually a flip-flop (D flip-flop, J-K flip-flop, etc.).
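
The doubling effect of the unit delay can be checked with a small behavioural sketch. It assumes that frames are shifted LSB first (the usual convention for bit-serial arithmetic, not stated explicitly in the text); the helper names `to_lsb_first`, `from_lsb_first` and `unit_delay` are illustrative only.

```python
def to_lsb_first(value, width):
    """Serialize a non-negative integer into a list of bits, LSB first (assumed order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_lsb_first(bits):
    """Rebuild the integer from an LSB-first bit list."""
    return sum(b << i for i, b in enumerate(bits))


def unit_delay(bits):
    """Model of one [T] element: every bit leaves one clock later; reset state is 0."""
    state = 0
    out = []
    for b in bits:
        out.append(state)   # the flip-flop outputs what it stored on the previous clock
        state = b
    out.append(state)       # one extra clock flushes the last stored bit
    return out


doubled = from_lsb_first(unit_delay(to_lsb_first(43, 7)))
print(doubled)  # 86, matching the example in the text
```

Delaying an LSB-first stream by one clock moves every bit to the next higher weight, which is why a single flip-flop acts as a multiplier by two.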
Full adder (FA) 

It performs binary addition. The inputs to this element are A, B, Cin (Carry in) 
while the outputs are Z and Cout (Carryout). The truth table for full adder 
functionality is shown in "Figure 3" of the drawings. 
Full subtractor (FS) 

It performs binary subtraction. The inputs to this element are A, B, Cin (Carryin) 
while the outputs are Z and Cout (Carryout). The truth table for full subtractor 
functionality is shown in "Figure 3" of the drawings. 
Serial adder (SA) and Serial Subtractor (SS)
It performs addition/subtraction of two serial frames, x1(nT) and x2(nT), to generate
the output y(nT), represented as x1(nT)+x2(nT) or x1(nT)-x2(nT). The serial adder (or
subtractor) is implemented using a full adder (or subtractor) with a flip-flop, as
shown in "Figure 3" of the drawings. The output Cout of [FA/FS] is delayed using
the [T] element and is applied to the Cin line of [FA/FS]. This enables the [FA/FS]
and [T] together to function as a serial adder/subtractor (SA/SS), where A, B are the
inputs to this element and Z is the output [e.g. of serial addition: if x1(nT) =
0110 (6 in integer) and x2(nT) = 0111 (7 in integer), then y(nT) = 01101 (13 in
integer representation)].
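
A minimal sketch of the serial adder described above, assuming LSB-first frames that are padded with one extra bit so the result fits; the function names are illustrative and not taken from the drawings.

```python
def full_adder(a, b, cin):
    """Combinational full adder [FA]: returns (Z, Cout)."""
    z = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return z, cout


def serial_add(x1_bits, x2_bits):
    """Serial adder [SA]: one [FA] whose Cout is fed back to Cin through a flip-flop [T]."""
    carry = 0                      # state of the [T] element, reset to 0
    out = []
    for a, b in zip(x1_bits, x2_bits):
        z, carry = full_adder(a, b, carry)
        out.append(z)
    return out


def to_lsb_first(value, width):
    return [(value >> i) & 1 for i in range(width)]


def from_lsb_first(bits):
    return sum(b << i for i, b in enumerate(bits))


# 6 + 7 with 5-bit frames (assumed padding so the carry out of bit 3 is not lost)
y = serial_add(to_lsb_first(6, 5), to_lsb_first(7, 5))
print(from_lsb_first(y))  # 13
```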
Serial Multiplier (M) 

It multiplies a serial input frame X(nT) by a coefficient m. The output is
represented as Y(nT) = X(nT) * m. A serial coefficient multiplier (M) can be
implemented by a shift register using [T] elements and adder elements [SA] (one
shift means multiplication by a factor of 2). As shown in "Figure 3" of the drawings,
the multiplier is formed by adding the shift-register outputs corresponding to the
ones in the binary representation of the coefficient.
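
At the integer level, the shift-and-add principle can be sketched as follows. This is only an arithmetic illustration under the assumption of a non-negative integer coefficient; it does not model the clocked hardware, and `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(x, m):
    """Multiply x by a non-negative integer coefficient m using only shifts and adds.

    Each shift stands for one [T] element (multiply by 2); each addition of a
    shifted copy stands for one [SA] element.
    """
    total = 0
    shift = 0
    while m:
        if m & 1:                 # a one in the binary representation of m
            total += x << shift   # add the corresponding shift-register tap
        m >>= 1
        shift += 1
    return total


print(shift_add_multiply(43, 11))  # 473, since 11 = 8 + 2 + 1
```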
Delay (Z^{-1})

Delay by one frame of data is done by shift register (series of Flip-flops (T) 
connected to store and shift the input frame). The number of Unit delay (T) in one 
delay element is equal to the frame size of the input. 

PRIOR ART OR EXISTING IMPLEMENTATION OF FILTER 
The following description discusses the elements used for implementation of the
architecture and the existing implementations of digital filters. The proposed
minimization is extendable to other applications in the digital signal processing
field and in digital design.
From here onwards, all illustration is done with an FIR filter, which is
extendable to other filters as described earlier. "Figure 4" shows the existing
structure of a bit-serial FIR filter with coefficient lines CLin_0, CLin_1, ..., CLin_n
and the coefficient block [A] having the coefficients c(0), c(1), c(2), ..., c(n). The
coefficient block is connected to delay elements [Z^{-1}] and serial adders [SA] to
form the filter structure.

Stating the FIR filter equation in time and frequency domain

Y(n) = c(0) X(n) + c(1) X(n-1) + c(2) X(n-2) + ... + c(n) X(0)

Y(z) = X(z) [c(0) + c(1) Z^{-1} + c(2) Z^{-2} + c(3) Z^{-3} + c(4) Z^{-4} + c(5) Z^{-5} + c(6) Z^{-6} + ... + c(n) Z^{-n}]

where X, Y are the input and output respectively, c(0), c(1), ..., c(n) represent
the coefficient values which define the characteristics of the filter, and each delay
block [Z^{-1}] represents a delay of one sample. The filter equation can be implemented
in two ways, as shown in "Figure 4" of the drawings.

In implementation 1, the coefficient lines CLin_0, CLin_1, ..., CLin_n are common
and connected to the input X[n]. The output lines CLout_0, CLout_1, ..., CLout_n are
connected to block [E], consisting of delay elements [Z^{-1}] and serial adder [SA]
elements. The structure makes it easy to realize share-able multipliers in the
coefficient block [A]. An example of a share-able multiplier with coefficient values
3 and 11 is illustrated in "Figure 4". Realizing these coefficients separately
would require 4[T], 3[SA] elements. By virtue of CLin_0, CLin_1, ... being
common, the hardware is realized using 3[T], 2[SA] elements. Another feature of
the structure is that it inherently requires more storage area, represented
by [Z^{-1}], as compared to implementation 2, since the storage is done after the
multiplication. For an input frame of n bits and a coefficient of size m bits, the
storage area of each delay element [Z^{-1}] is (m+n). The total storage space of the
delay elements is (m+n) * (number of coefficients - 1).

In implementation 2, the coefficient lines CLin_0, CLin_1, ... are not common.
Because different input lines are connected to the coefficient elements [c(0),
c(1), ...], a realization of coefficient block [A] using share-able elements is not
present. Another feature of this structure is that it inherently requires less storage
space, represented as [Z^{-1}]; unlike in the previous implementation, here the storage
is done before multiplication. For an input frame of m bits and a coefficient of size
n bits, the storage area of each delay element [Z^{-1}] is (m). The total storage space
is (m) * (number of coefficients - 1).
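
The two storage formulas can be compared with a short sketch. The word lengths used below (16-bit input frame, 8-bit coefficients, 7 coefficients) are hypothetical values chosen only for illustration, and the function and parameter names are not from the source.

```python
def delay_storage_bits(input_bits, coeff_bits, num_coeffs, store_after_multiply):
    """Total flip-flops in the frame delays [Z^-1].

    store_after_multiply=True  models implementation 1 (products are stored),
    store_after_multiply=False models implementation 2 (raw input is stored).
    """
    per_delay = input_bits + coeff_bits if store_after_multiply else input_bits
    return per_delay * (num_coeffs - 1)


# Hypothetical sizes: 16-bit input, 8-bit coefficients, 7 coefficients
impl1 = delay_storage_bits(16, 8, 7, store_after_multiply=True)
impl2 = delay_storage_bits(16, 8, 7, store_after_multiply=False)
print(impl1, impl2)  # 144 96
```

Under these assumed sizes, storing before multiplication saves one third of the delay storage, which is the motivation for preferring implementation 2.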

The invention aims to reduce the area of the coefficient block [A] and to have
share-able elements in the coefficients even if the coefficient lines CLin_0,
CLin_1, ... are not commonly connected. For the existing configuration, as shown in
"Figure 7" and "Figure 8", the share-ability of hardware in block [A] is a
limitation.

Also, as described in the previous section, implementation 2 is area-efficient with
respect to implementation 1 due to the reduced size of the delay elements. Over and
above this, by having a share-able multiplier or reduced coefficient block [A], which
are the key features of the invention, implementation 2 becomes still more
area-efficient. This reduction is extendable to other filters based on coefficient
block [A], as stated in the first section. The present invention operates on
integer-valued coefficients.

Further, to quote Norsworthy and Crochiere (Delta-Sigma Data Converters, IEEE
Press, pp. 435, copyright 1997):

"Bit-serial architecture reduce the interprocessor communication down to 1 bit.
Generally the number of processors is very large, but because each processor is so
small, the overall economy is very high. Bit serial architectures are usually most
effective for filters having a few state variables, such as IIR filters and the
wave-digital filters. For this reason, bit-serial techniques are less frequently
applied to FIR structures, especially when the filter length is relatively long."

However, the present invention reduces the area for large-sized coefficients by
applying a number of optimizations in FIR/IIR filter structures.

To elaborate the applicant's optimization techniques, consider an FIR filter with
coefficients 5, 14, 25, 30, 25, 14 and 5. Though the size of the coefficients in
this example is small, it is enough to elaborate the minimization proposals. In most
practical cases, the coefficients are symmetrical.
Stating the FIR filter equation in time and frequency domain

Y(n) = c(0) X(n) + c(1) X(n-1) + c(2) X(n-2) + ... + c(n) X(0)

Y(z) = X(z) [c(0) + c(1) Z^{-1} + c(2) Z^{-2} + c(3) Z^{-3} + c(4) Z^{-4} + c(5) Z^{-5} + c(6) Z^{-6} + ... + c(n) Z^{-n}]

where X, Y are the input and output respectively and c(0), c(1), ..., c(n) represent
the coefficient values.

Using the coefficient values in the above equation

Y(n) = 5 X(n) + 14 X(n-1) + 25 X(n-2) + 30 X(n-3) + 25 X(n-4) + 14 X(n-5) + 5 X(n-6)

Y(z) = X(z) [5 + 14 Z^{-1} + 25 Z^{-2} + 30 Z^{-3} + 25 Z^{-4} + 14 Z^{-5} + 5 Z^{-6}]    (EQ 1)
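
As a reference model for the example, "Equation 1" can be evaluated directly in software. The sketch below works on whole integer samples rather than bit-serial frames and assumes zero initial conditions; `fir` and `COEFFS` are illustrative names.

```python
COEFFS = [5, 14, 25, 30, 25, 14, 5]   # example coefficients from the text


def fir(samples, coeffs=COEFFS):
    """Direct evaluation of Y(n) = sum_k c(k) * X(n-k), with X(n) = 0 for n < 0."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out


# The impulse response reproduces the coefficient list
print(fir([1, 0, 0, 0, 0, 0, 0]))  # [5, 14, 25, 30, 25, 14, 5]
```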

The Existing Method and Minimization

"Figure 5" of the drawings shows the FIR filter structure of implementation 2. The
figure illustrates the realization of the FIR filter represented by "Equation 1".
One of the known optimization techniques takes advantage of the symmetry in
the coefficients. The streams which have to be multiplied by the same
coefficient can be added first and then multiplied. For a large filter structure, this
leads to a reduction by 45% in the coefficient block (see "Figure 6" of the
accompanying drawings).

This is done by restructuring the equation as under:

Y(z) = X(z) [5*(1 + Z^{-6}) + 14*(Z^{-1} + Z^{-5}) + 25*(Z^{-2} + Z^{-4}) + 30*Z^{-3}]    (EQ 2)
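
The folded form of "Equation 2" can be sketched as below, again on integer samples with zero initial conditions. The intermediate sums are named s1 to s4 to mirror the lines S1 to S4 used later in the text; the function name is illustrative.

```python
def fir_folded(samples):
    """Symmetric FIR: add the two streams sharing a coefficient, then multiply once."""
    def x(i):
        return samples[i] if 0 <= i < len(samples) else 0

    out = []
    for n in range(len(samples)):
        s1 = x(n) + x(n - 6)       # shares coefficient 5
        s2 = x(n - 1) + x(n - 5)   # shares coefficient 14
        s3 = x(n - 2) + x(n - 4)   # shares coefficient 25
        s4 = x(n - 3)              # centre tap, coefficient 30
        out.append(5 * s1 + 14 * s2 + 25 * s3 + 30 * s4)
    return out


print(fir_folded([1, 0, 0, 0, 0, 0, 0]))  # [5, 14, 25, 30, 25, 14, 5]
```

Seven multiplications per output sample are reduced to four, at the cost of three pre-additions.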

For the rest of the optimization proposals, the discussion covers only the
multiplier-adder series shown in the dotted box, referred to as coefficient
block [A]. "Figure 7" of the drawings shows the traditional implementation
of the example structure for block [A], wherein S1 to S4 represent the lines
connected to the delay blocks [Z^{-1}] through lines CLin_0 to CLin_6 depicted in
"Figure 6" of the drawings. The lines S1 to S4 are separately connected to [T]
elements for performing multiplication by a factor of 2, and (SA) is used to
perform serial addition of data. This represents the multiplier-less realization of
filter coefficient block [A], where the property of the flip-flop (T) as a multiplier
by a factor of two is used.

Mathematically, the restructured equation according to the structure is stated as

Y(nT) = (4+1)S1 + (8+4+2)S2 + (16+8+1)S3 + (16+8+4+2)S4    (EQ 3)

In this implementation, the S1, S2, S3, S4 lines are not commonly connected. This
prevents share-able hardware in coefficient block [A]. Thus all the
functions/operations of this block represent unique hardware. The elements required
by the terms are listed as
First term = 2[T], 1[SA]
Second term = 3[T], 2[SA]
Third term = 4[T], 2[SA]
Fourth term = 4[T], 3[SA]

Final addition of all the four terms would require 3[SA].
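
The element counts listed above follow from the binary representation of each coefficient. The sketch below reproduces them under the assumption of positive integer coefficients realized with additions only; the function names are illustrative.

```python
def term_elements(coeff):
    """[T] and [SA] count for one unshared shift-and-add term.

    The shift register needs one [T] per bit position up to the highest set bit,
    and the taps at the set bits are combined with (number of ones - 1) adders.
    """
    t_count = coeff.bit_length() - 1
    sa_count = bin(coeff).count("1") - 1
    return t_count, sa_count


def block_elements(coeffs):
    """Totals for the traditional coefficient block, including the final additions."""
    t_total = sum(term_elements(c)[0] for c in coeffs)
    sa_total = sum(term_elements(c)[1] for c in coeffs) + (len(coeffs) - 1)
    return t_total, sa_total


print([term_elements(c) for c in (5, 14, 25, 30)])  # [(2, 1), (3, 2), (4, 2), (4, 3)]
print(block_elements([5, 14, 25, 30]))              # (13, 11)
```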

The generalized structure of "The Existing Method and Minimization" is depicted
in "Figure 8". In the structure, each column represents a coefficient value. The [T]
elements, shown as T1_1 to T1_m in column 1, define connectivity with line S1.
In similar fashion, the [T] elements, shown as Tn_1 to Tn_m in column n, define the
connectivity with line Sn.

The presence of each of the elements in columns 1 to n (i.e. T1_1 to T1_m, T2_1 to
T2_m, ..., Tn_1 to Tn_m) is determined by the coefficient value. Thus, depending on
the coefficient value on lines S1 to Sn, the number of [T] elements in a column is
determined. Also, the serial adders/subtractors [SA/SS] in the columns are
represented as (SA1_1 to SA1_m, SA2_1 to SA2_m, ..., SAn_1 to SAn_m). The
presence of each of these elements is again defined by the coefficient value.

In the structure, the [T] elements are arranged in shift register form. The input to
the first [T] element is connected to one of the S lines, while the inputs to [SA/SS]
are connected from the inputs S1 to Sn and/or one of the outputs of the [T] elements
of the shift register, depending on the coefficient value. Finally, using the SAe_1 to
SAe_n-1 elements, the addition/subtraction [SA/SS] of all the coefficient terms
depicted in the columns is done. The final output is the output of the last
addition/subtraction [SA/SS].

Among the lines S1 to Sn, the [T] elements are not share-able, and the [SA] elements
in each column are also not share-able. Thus only limited minimization is possible in
this structure.

Minimization (Already applied as patent)

This structure reduces the hardware of the coefficient block [A] by having
share-able elements in the coefficients, even if the coefficient lines CLin_0,
CLin_1, ... are not commonly connected. This structure reduces the area by
approximately 30-50% relative to "Figure 7" of the drawings by reducing the number of
components and by sharing components. Here the optimization techniques are
illustrated with examples, and the end of this section depicts the generalized
equation and structure of the device.

Continuing the same example of the FIR filter and using "Equation 3" of the previous
section,

y(nT) = 5 * S1 + 14 * S2 + 25 * S3 + 30 * S4

y(nT) = (4+1)S1 + (8+4+2)S2 + (16+8+1)S3 + (16+8+4+2)S4

The applicants proceed to share the shift registers (multiply by 2) of the design.

= (S3+S4)*16 + (S2+S3+S4)*8 + (S1+S2+S4)*4 + (S2+S4)*2 + (S1+S3)

= (S1+S3) + 2*(S2+S4 + 2*(S1+S2+S4 + 2*(S2+S3+S4 + 2*(S3+S4))))    (EQ 4)

Finding out the common additive factors

A1 = S2+S4

A2 = S3+S4

"Equation 4" can be further reduced as

y(nT) = (S1+S3) + 2*(A1 + 2*(S1+A1 + 2*(S2+A2 + 2*A2)))    (EQ 5)
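
The regrouping by powers of two and the reduced form of "Equation 5" can be checked numerically. The sketch below derives, for each bit weight, which S lines take part, and evaluates the nested expression; `bit_planes` and `eq5` are illustrative names and the test inputs are arbitrary.

```python
COEFFS = {"S1": 5, "S2": 14, "S3": 25, "S4": 30}


def bit_planes(coeffs):
    """For each weight 2**k (k = 0 first), list the lines whose coefficient has bit k set."""
    width = max(coeffs.values()).bit_length()
    return [[name for name, c in coeffs.items() if (c >> k) & 1]
            for k in range(width)]


def eq5(s1, s2, s3, s4):
    """Nested (Horner-like) form with the shared sums A1 and A2."""
    a1 = s2 + s4
    a2 = s3 + s4
    return (s1 + s3) + 2 * (a1 + 2 * (s1 + a1 + 2 * (s2 + a2 + 2 * a2)))


for k, lines in enumerate(bit_planes(COEFFS)):
    print(2 ** k, lines)

# Agreement with the plain weighted sum 5*S1 + 14*S2 + 25*S3 + 30*S4
print(eq5(3, 5, 7, 9), 5 * 3 + 14 * 5 + 25 * 7 + 30 * 9)  # 530 530
```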

The implementation flow for this equation is illustrated here, and the hardware
implementation is shown in "Figure 9" and "Figure 10" of the drawings [e.g. SA(1),
SA(2) etc. represent the adders, and T(1), T(2) etc. represent the unit delays]. In
the flow of implementation, S1, S2, S3, S4 represent the four inputs. The primary
addition is done using serial adders SA(1), SA(3), SA(9), representing addition of
the terms S1+S3, S2+S4, S3+S4, while the secondary and tertiary additions are done
using the adders SA(5), SA(7), SA(8), SA(6), SA(4), SA(2). The multiplication by a
factor of two is done using the elements T(1), T(2), T(3), T(4).



Implementation flow of equation

[Diagram: the terms of the equation marked with their bit positions BIT4, BIT3, ...]

The hardware implementation is shown in "Figure 9" of the drawings, wherein the
input lines S1 to S4 represent the lines connected to the delay blocks [Z^{-1}] through
coefficient lines CLin_0 to CLin_6 depicted in "Figure 6" of the drawings. The
lines S1 to S4 are connected to block [B] for performing the serial
addition/subtraction, for which (SA), (SS) elements are used within block [B]. The
output of each block [B] is terminated with a [T] block, which represents the block
[B] output being multiplied by a "factor of 2". The output b_1 of block [B], which
is at bit position 0, is fed to the input of T(1); in turn, the output line t_1 of
element [T(1)] is fed to the next section of block [B]. Thus each addition defines a
bit position before getting multiplied by 2 and changing to the next bit position.
All [T] elements are represented by block [C]. In the structure, the flip-flop [T]
representing multiplication by a "factor of 2" is pushed out to be shared between the
various coefficient values, hence reducing the number of flip-flops (T).

In the minimization of "Figure 9" of the drawings, the approximate area is
9 serial adders + 4 T = 22 units, whereas the area of "Figure 7" of the drawings is
11 serial adders + 13 T = 35 units (assuming 1 unit = 1 FA = 2 HA = 1 T, and serial
adder = 2 units). This results in a 37% saving in area (13/35 * 100).

DETAILED DESCRIPTION OF THE INVENTION 
Minimization Proposed in the present invention 

The invention reduces the area of the coefficient block [A] by having share-able
elements in the coefficients, even in the implementation where the coefficient lines
CLin_0, CLin_1, ... are not commonly connected (shown as architecture [A]). This
coefficient block [A], when applied in implementation 2 ("Figure 4") of the FIR
filter, makes it still more area-efficient. This reduction is extendable to other
filters based on coefficient block [A], as stated in the first section.
An area-efficient implementation of the filter coefficient block is done using full
adder (FA) blocks instead of serial adders (SA). It is known that a serial adder
consists of one full adder (FA) and one flip-flop (T) element (refer to "Figure 3" of
the drawings). This makes a serial adder (SA) twice as expensive in area as one full
adder (FA) block [area of serial adder (SA) = 1 FA + 1 T = 2 units, while the
area of 1 FA = 1 unit]. In this implementation, the reduction in area of the
coefficient block [A] is achieved by maximising the use of full adders (FA), i.e. by
replacing serial adders (SA) with full adders (FA) in block [A].

The above is achieved by providing a device for area-efficient realization of
coefficients, said device comprising architecture [A] with hardware sharing
techniques and optimization applied to this architecture; the architecture [A] is
connected to coefficient lines CLin_0, CLin_1, ..., CLin_n and/or BLin_0,
BLin_1, ..., BLin_n coming from block [E] and/or [F], so as to perform a
filtering operation or a mathematical computing operation with optimization in
hardware, and provides a zero latency clock output; the serial input bit lines of said
architecture [A] are S1, S2, ..., Sn [where n represents the number of coefficients
of the filter]; the addition terms of the equation [(a0*S1+b0*S2+...+k0*Sn),
(a1*S1+b1*S2+...+k1*Sn), ..., (am*S1+bm*S2+...+km*Sn)] are represented
as blocks [B]; the said block [B] is a combinational block consisting of full adder
(FA) and full subtractor (FS) elements; the values a0, b0, etc. are (+1, -1 or 0);
the connection of the elements (FA/FS) to the S1, S2, ..., Sn lines and the
interconnection of the elements (FA, FS) depend on the values of the coefficients; the
final output of the last element [FA/FS] of each block [B] is terminated through lines
b_1, b_2, ..., b_m at [T] elements; the number of T elements in cluster [C] depends on
the size of the maximum coefficient value and is share-able for all the coefficients
in the coefficient architecture [A]; in the said architecture all the combinational
elements [B] are clustered together as [D] and all the unit delay elements {T[1],
T[2], ..., T[m]} are clustered together in [C], thereby separating the sequential and
combinational logic; in the said architecture [A] the output of element [T] is
connected to one of the inputs of the combinational logic of block [B] of the next bit
position; the interconnections from cluster [C] to [B] are represented as t_1,
t_2, ..., t_m; the elements [FA/FS] are arranged in matrix form, FA0_0 to FA0_n in
bit position 0, FA1_1 to FA1_n in bit position 1, ..., FAm_1 to FAm_n in bit
position m, whose presence is defined by the coefficient value; the carry-out pin of
the full adder (FA) of each cluster stage [B] in the said architecture [A] is fed to
an input of a full adder (FA) of the previous stage cluster [B], i.e. the stage
preceding the flip-flop (T) element of cluster [C]; in this way the same flip-flop [T]
(T1, T2, T3, ..., Tn) is used for multiplication by "a factor of two" and also in the
implementation of the carry structure of the one-bit serial adder; in the said
architecture some extra components, represented as block [Ex], are used for
connecting the carry-outs of all the adders/subtractors [FA/FS] of the last stage of
[D], and the elements [FA/FS] and [T] are used within this block; hence, the said
architecture [A] structures the circuit into a sequential block [C] consisting of [T]
elements and a combinational block [D] consisting of [FA, FS] elements; the [T]
elements of block [C] are common to all the coefficients, are share-able and are
positioned at the end of each block [B]; block [D] has the combinational element
blocks [B], which are essentially [FA, FS], thereby making share-able hardware within
block [D]; and the final output is the output of the BITm position.

In the present device, preferably, the area-minimal realization of digital filters
based on coefficient architecture [A] is achieved when it is operated in bit-serial
fashion. The structure provides hardware minimization for finite impulse response
(FIR) filters, infinite impulse response (IIR) filters and other filters and
applications related to combinational logic consisting of delay elements (T),
multipliers (M), adders and subtractors. Further optimization in cluster [D]
is done by using common adders (FA) and common subtractors (FS) and sharing their
outputs, by using a subtractor (FS) instead of adders when the coefficient
value is close to a power of two, or by minimizing the use of subtractors by taking a
common subtraction operator and using adders instead.

The present device, when used in implementation 2 of an FIR/IIR filter and similar
filter structures, results in an area-efficient realization of the filter: the storage
area in implementation 2, referred to as delay elements [Z^{-1}], is smaller than in
implementation 1 due to the inherent property of the structure of
implementation 2, and an additional saving in area in the filter coefficient
realization is achieved by using the claimed structure of coefficient architecture
[A] of "Figure 12".

In the implementation flow explained below, the carry-out (COUT) pin of the full
adder (FA) of each stage is fed to the (CIN) input of a full adder (FA) of the
previous stage, i.e. the stage preceding the flip-flop (T) element. In this way, the
flip-flop (T1, T2, T3, T4) which is used for multiplication by two is used again, to
function as carry storage and to enable [FA] to perform as a one-bit serial adder.

Rewriting the equation of the FIR filter for the example shown in the previous section
y(nT) = (S1+S3) + 2*(A1 + 2*(S1+A1 + 2*(S2+A2 + 2*A2)))    (EQ 6)

Using full adder (FA) components in "Equation 6", it is seen that the number of full
adders used is the same as the number of one-bit serial adders used in the earlier
architecture. In the proposed patent, depending on the carry-out of the BIT0 position,
some half adders or some extra elements are present.

Implementation flow of equation

y(nT) = (S1+S3) + 2*(S2+S4 + 2*(S1+S2+S4 + 2*(S2+S3+S4 + 2*(S3+S4))))

BIT4: S1+S3
BIT3: S2+S4
BIT2: S1+S2+S4
BIT1: S2+S3+S4
BIT0: S3+S4
As shown in the above implementation flow, the equation defines the bit positions
BIT0 to BIT4, which are the positions of "multiplication by a power of two" (e.g.
BIT0 represents multiplication by 2^0). At the BIT0 position, the addition S3+S4 is
performed and the output is terminated at T(1). The output of T(1) defines the next
bit position BIT1, which performs the addition of S2+S3+S4 using the [FA] elements
together with the output of T(1). The output of this addition is again terminated at
T(2). The structure is repeated in the next bit positions. The carry-outs of the [FA]s
are fed to the previous bit position. The final addition at bit position BIT4 gives
the output of the coefficient block [A].

The implemented structure is shown in "Figure 11", wherein the input lines S1 to
S4 represent lines connected to the delay blocks [Z^{-1}] through coefficient lines
CLin_0 to CLin_6 depicted in "Figure 6" of the drawings. The lines S1 to S4 are
connected to block [B] for performing the serial addition/subtraction, for which (FA),
(FS) elements are used within block [B]. All [T] elements are represented by block [C].

The adders at all the bit positions [B], represented by FA(1), HA(2), ..., FA(10), are
clustered in [D]. The inputs of the adders [FA] are connected from coefficient lines
S1, S2, S3, S4 and from the unit delay element of the previous bit position. The
addition/subtraction is performed in the [B] block and the final output of the last
adder [FA] is connected to a [T] element, which is used for "multiplication by a
factor of 2". The interconnections from the [B] blocks to the [T] block are
represented as b_1, b_2, b_3, b_4. The outputs of [T] are connected to one of the
inputs of the combinational logic of block [B] of the next bit position (i.e.
connected to the input of the first element (FA) of block [B]). These interconnections
of [T] from cluster [C] to [B] are represented as t_1, t_2, t_3, t_4, and the bit
positions are marked as BIT0, BIT1, BIT2, BIT3, BIT4. An example of connectivity is
explained here. The output b_1 of block [B], which is at bit position BIT0, is fed to
the input of T(1); in turn, the output line t_1 of element [T(1)] is fed to the next
section of block [B]. Thus each addition defines a bit position before getting
"multiplied by a factor of 2" and changing to the next bit position.

The connection of COUT (carry-out) of all the [FA]s of one stage is explained here.
The carry-out (COUT) pin of a full adder (FA) of each cluster stage [B] is fed to one
of the inputs of a full adder (FA) of the previous stage cluster [B], i.e. the stage
preceding the flip-flop [T] element of cluster [C], thus utilizing the [T] element of
that bit position again. This enables all the [FA] elements in that bit position to
use the [T] element for carry storage during the serial addition operation.
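
The carry routing can be illustrated with a behavioural sketch. It is a simplification made under several assumptions that are not taken from the drawings: data is LSB first, each bit position is modelled as an abstract multi-operand adder (a count of ones) rather than individual FA/HA cells, the shared sums A1 and A2 are not shared, and the [Ex] block is modelled as a register that returns the carries of BIT0 to BIT0 one clock later. The names `simulate_block_a` and `STAGES` are hypothetical.

```python
def bit(value, t):
    """Bit t of a non-negative integer (the serial bit present on clock t, LSB first)."""
    return (value >> t) & 1


def simulate_block_a(stage_inputs, streams, clocks):
    """Clock-by-clock model: stage k adds its S lines, the registered sum bit of
    stage k-1 (through the shared [T]) and the carries of stage k+1."""
    stages = len(stage_inputs)
    t_reg = [0] * stages      # t_reg[k] feeds stage k (k >= 1); t_reg[0] is unused
    ex_carry = 0              # carries of BIT0 held in the assumed [Ex] registers
    y = 0
    for t in range(clocks):
        sums = [0] * stages
        carry = 0             # carries arriving from the following stage
        for k in range(stages - 1, -1, -1):
            count = sum(bit(streams[name], t) for name in stage_inputs[k])
            count += t_reg[k] if k > 0 else ex_carry
            count += carry
            sums[k] = count & 1      # sum bit goes to the [T] of this stage
            carry = count >> 1       # weight-2 carries go to the previous stage
        ex_carry = carry
        for k in range(1, stages):
            t_reg[k] = sums[k - 1]
        y |= sums[-1] << t           # output bit is available in the same clock
    return y


STAGES = [["S3", "S4"], ["S2", "S3", "S4"], ["S1", "S2", "S4"],
          ["S2", "S4"], ["S1", "S3"]]          # BIT0 ... BIT4
streams = {"S1": 3, "S2": 5, "S3": 7, "S4": 9}
print(simulate_block_a(STAGES, streams, 16))   # 530 = 5*3 + 14*5 + 25*7 + 30*9
```

A carry produced at one bit position has twice the weight of that position's sum bit; injecting it one stage earlier lets the existing [T] supply the factor of two, so no separate carry flip-flop is needed per adder.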

In the invention, the flip-flop [T] is used for a dual purpose:

1) Multiplication of the output of block [B] by a factor of two, used by all coefficients.
2) Common use of the same [T] elements by block [B] together with [FA], to
enable it to perform as a serial adder (SA).

For example, at bit position 3, HA(2) performs addition of the data on the S2, S4
lines. The output Z represents the shared adder A1, being fed to FA(3) and FA(5). The
output Z of FA(3), which defines bit position 3, is terminated at the [T(4)] element.
The Cout pins at this bit position are connected to the Cin of any adder [FA(5)] in
the previous bit location, hence utilizing the [T(4)] element to enable all the FAs at
this location to work as an [SA]. The structure of an [SA] is essentially an [FA]
along with a [T] element connecting the Cout of the FA to its Cin pin.

In this implementation, there are some extra elements such as FA(11) and Te(2),
Te(1), which are required to terminate the carry-out (Cout) at bit position 0. The
number of [FA] elements is equal to (number of Cout lines - 1) in bit position 0,
and the number of [T] elements is equal to the (number of Cout lines) in bit
position 0. The extra elements are represented as the [Ex] block.

The circuit is structured into a sequential block [C] consisting of [T] elements and
a combinational block [D] consisting of FA/FS elements.

a) Block [C], having sequential elements, is common to all the coefficients and
has share-able elements [T] positioned at the end of each block [B].
b) Block [D] has the combinational blocks [B], which are essentially FA, FS. The
hardware is share-able not only within a block [B] but also across the various [B]
blocks. Hence the component hardware within block [D] is minimized.

The minimization in block [D] is achieved by using the following minimization
techniques:

1) Sharing of common adder terms, i.e. utilizing a common adder multiple times.

2) Using subtraction instead of addition when the coefficient is close to a
power of 2, e.g. 63 is better realized as (64-1) than as (32+16+8+4+2+1). In the
former case the number of subtractors is 1, as compared to 5 adders in the
latter case.

3) Taking out common subtraction operations and maximizing the use of adders.
This is because subtraction is expensive as compared to the addition operation.
In the present minimization of "Figure 11", the approximate area calculation is
[9 FA + 2 HA + 6 T = 16 Units]. As seen above, the area of the minimization
under section "The Existing Method and Minimization" ("Figure 7") is 35 units,
and the area of the minimization under section "Minimization (Already applied
as patent)" ("Figure 9") is 22 units. The current minimization is therefore an
improvement in the coefficient block of 54% {(35-16)/35} and 27% {(22-16)/22}
respectively over the two structures (assuming 1 Unit = 1 FA = 2 HA = 1 T and
serial adder = 2 Units).
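
The unit arithmetic above can be reproduced with a few lines of Python. This is
only a check of the stated numbers under the stated assumption (1 Unit = 1 FA =
2 HA = 1 T, serial adder = 2 Units); the names UNIT, area and improvement are
hypothetical.

```python
# Area weights taken from the assumption stated in the text.
UNIT = {"FA": 1.0, "HA": 0.5, "T": 1.0, "SA": 2.0}


def area(**counts):
    """Total area in units for the given element counts."""
    return sum(UNIT[name] * n for name, n in counts.items())


def improvement(old, new):
    """Fractional area saving of `new` relative to `old`."""
    return (old - new) / old


figure_11 = area(FA=9, HA=2, T=6)           # proposed minimization
gain_over_fig7 = improvement(35, figure_11)  # versus 35 units ("Figure 7")
gain_over_fig9 = improvement(22, figure_11)  # versus 22 units ("Figure 9")
```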

GENERALIZED STRUCTURE OF THE INVENTION 

The invention provides an area efficient realization of the filter coefficient
block [A], applicable to filter devices such as FIR, IIR and other filter
structures based on this block. This architecture is also applicable to
combinational and sequential logic consisting of adders, subtractors,
multipliers and flip-flops [T]. The architecture is realized using full adder
(FA), full subtractor (FS) and flip-flop [T] elements.

Beginning with the generalized equation of the FIR filter coefficient block [A]:

y(nT) = a*S1 + b*S2 + c*S3 + ... + k*Sn          (1)

where a, b, ..., k represent the filter coefficients and S1, S2, ..., Sn
represent the bit lines corresponding to the coefficients.

Now, representing each coefficient as an addition of terms arranged in powers of
two and applying this to the equation:

y(nT) = (2^m*am + ... + 2^1*a1 + 2^0*a0) * S1
      + (2^m*bm + ... + 2^1*b1 + 2^0*b0) * S2
      + (2^m*cm + ... + 2^1*c1 + 2^0*c0) * S3 + ...
      + (2^m*km + ... + 2^1*k1 + 2^0*k0) * Sn

Further, taking "2" as a common factor, we get the generalized equation for the
architecture under claim as:

y(nT) = (a0*S1 + b0*S2 + ... + k0*Sn)
      + 2^1 * ((a1*S1 + b1*S2 + ... + k1*Sn)
      + 2^1 * ((a2*S1 + b2*S2 + ... + k2*Sn)
      + 2^1 * ((a3*S1 + b3*S2 + ... + k3*Sn) + ...
      + 2^1 * (am*S1 + bm*S2 + ... + km*Sn))))

where a0, a1, ..., am, b0, b1, ..., bm and k0, k1, ..., km represent the signs of
the coefficient terms [i.e. each has the value +1, -1 or 0]. The architecture
realization in "Figure 12" is done using sequential elements, namely unit delays
[T], and combinational elements such as full adders (FA) and full subtractors
(FS).
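
That the nested form gives the same result as equation (1) can be checked
numerically. The Python sketch below is an illustration with made-up values:
the three coefficients (3, 5 and 7, the last written as 8-1), the inputs S and
the function names are all assumptions chosen for the example.

```python
# Signed digits per coefficient, LSB first (index j is the weight 2^j).
# Assumed example: 3 = 1+2, 5 = 1+4, 7 = 8-1.
DIGITS = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [-1, 0, 0, 1],
]
S = [2, -4, 10]  # assumed sample values on lines S1, S2, S3


def coefficient(digits):
    """Value of one coefficient from its signed power-of-two digits."""
    return sum(d * (1 << j) for j, d in enumerate(digits))


def direct(digits_per_coeff, s):
    """Equation (1): sum of coefficient * input."""
    return sum(coefficient(d) * x for d, x in zip(digits_per_coeff, s))


def nested(digits_per_coeff, s):
    """Nested form: each level adds one block [B] term and doubles the rest."""
    m = len(digits_per_coeff[0]) - 1
    acc = 0
    for j in range(m, -1, -1):  # from bit position m down to bit position 0
        block_b = sum(d[j] * x for d, x in zip(digits_per_coeff, s))
        acc = block_b + 2 * acc  # the factor of two is what [T] provides
    return acc


y_direct = direct(DIGITS, S)
y_nested = nested(DIGITS, S)
```

Each pass through the loop corresponds to one block [B] followed by one
multiplication by two, which is why a single chain of [T] elements can serve
all the coefficients.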

In "Figure 12", the input data is present on bit lines S1, S2, ..., Sn [where n
represents the number of coefficients of the filter]. The addition terms of the
equation [(a0*S1+b0*S2+...+k0*Sn), (a1*S1+b1*S2+...+k1*Sn), ...,
(am*S1+bm*S2+...+km*Sn)] are represented as blocks [B]. Block [B] is a
combinational block consisting of full adder (FA) and full subtractor (FS)
elements. Since a0, b0, etc. take the values +1, -1 or 0, the connection of the
elements (FA/FS) to the S1, S2, ..., Sn lines and the interconnection of the
elements (FA, FS) depend on the value of the coefficients [the value of a
coefficient determines the values of a0, a1, etc. and hence defines the
interconnections between them].

All the addition/subtraction operations at a bit location are performed in block
[B], and the output of each block [B] is terminated at a [T] element, which is
essentially used to multiply the block [B] output by a factor of two and pass
the output to the next bit position {the elements T[1], T[2], ..., T[m] are used
for this}. The connections (b_1, b_2, ..., b_m) are used for termination of the
output of block [B]. The bit positions of the serial data frame are marked as
BIT0, BIT1, ..., BITm. The number of [T] elements depends on the size of the
maximum coefficient, and the [T] elements are share-able for all the
coefficients in the coefficient block [A]. Also, all the elements [B] are
clustered together as [D] and all the unit delay elements {T[1], T[2], ...,
T[m]} are clustered together in [C], thus separating the sequential and
combinational logic. The input of a unit delay element [T] is the final output
of a block [B], and the output of the element [T] is connected to one of the
inputs of the combinational logic of block [B] of the next bit position (i.e.
to the input of the first element (FA or FS) of block [B], depending upon the
sign value +/-). The interconnections from cluster [C] to [B] are represented as
t_1, t_2, ..., t_m.
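
The use of a unit delay [T] as a multiplier by two can be seen in a short
model. The Python sketch below assumes LSB-first serial data in a fixed-width
frame (so the result is only exact while it fits in the frame); the names
delay_t, to_bits and from_bits are hypothetical.

```python
def to_bits(value, width):
    """Non-negative integer to an LSB-first bit list (assumed serial format)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """LSB-first bit list back to a non-negative integer."""
    return sum(bit << i for i, bit in enumerate(bits))


def delay_t(bits):
    """Unit delay [T]: each output bit is the input bit of the previous clock.

    The element is assumed to start cleared, so the first output bit is 0,
    and the last input bit falls outside the frame.
    """
    return [0] + bits[:-1]


once = from_bits(delay_t(to_bits(5, 8)))            # one [T]:  5 * 2
twice = from_bits(delay_t(delay_t(to_bits(5, 8))))  # two [T]s: 5 * 4
```

Because every bit arrives one clock later, it is interpreted at the next higher
weight, which is the multiplication by two that the chain T[1] ... T[m]
provides between bit positions.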

Thus, the [T] elements clustered as [C] are share-able for all the coefficients,
and the full adder/subtractor (FA/FS) components are clustered as [D]. The
carry-out pin of each full adder (FA) of a cluster stage [B] is fed to an input
of a full adder (FA) of the previous stage cluster [B], i.e. the stage preceding
the flip-flop [T] element of cluster [C]. In this way, the same flip-flop [T]
that is used for multiplication by a factor of two (T1, T2, T3 ... Tn) is shared
to implement the carry structure of the one-bit serial adder.

Extra components, represented as the [Ex] block, are used for connecting the
carry-outs of all the adders/subtractors (FA/FS) of the last stage of [D], i.e.
bit position BIT0. Full adders/full subtractors [FA/FS] and unit delays [T] are
used in this block. The COUT (carry-out) lines of bit position BIT0 are
connected to the [Ex] block (typically to the inputs of elements such as [FA]
or [FS]). Using a [T] element, the carry-out (COUT) of each [FA/FS] of the [Ex]
block is fed to the CIN of the same element. The Z output of each [FA] is
connected to input A or B of the next [FA] element, so a binary tree can be
formed here. The numbers of [FA] and [T] elements in the [Ex] block are [number
of carry-out pins - 1] and [number of carry-out pins] respectively.

In the invention, hardware optimization in both clusters [C] and [D] is achieved
through the reduced number of unit delays [T] and the reduced adder/subtractor
area (FA, FS). The gain in hardware is explained below.

Hardware reduction in block [C] 

For filters having large coefficients, this structure reduces the area of the
coefficient block [A] by 50-75%.

Before proving this statement, the element counts are formulated for

1) the number of flip-flops (T)
2) the number of serial adders (SA) or full adders (FA)

This comparison is done here. The generalized structure of "The Existing Method
and Minimization" is illustrated in "Figure 8". The other structures for
comparison are "Minimization (Already applied as patent)" in "Figure 10" and
"Generalized structure of the invention" in "Figure 12" of the drawings.

1) The number of flip-flop [T] elements in the coefficient block depends on the
size of all the coefficients. The approximate and pessimistic formula for the
total number of flip-flops (T) in the coefficient block in "The Existing Method
and Minimization" ("Figure 8") is [= average size of coefficient * number of
coefficients], where the average size of a coefficient is calculated
pessimistically as (maximum coefficient size / 2). In the "Minimization (Already
applied as patent)" and "Proposed Minimization (Proposal for Patent)", by
contrast, the number of [T] elements is [= maximum size of coefficient], since
the flip-flops (T) are share-able here.

2) The approximate formula for the total number of adders (SA) in the
coefficient block for the above-mentioned cases is [= adders per coefficient *
number of coefficients]. The number of adders per coefficient depends solely on
the value of the coefficient. Assuming no optimization, in the worst case the
number of adders per coefficient is (= maximum coefficient size / 2), so the
total is (= number of coefficients * maximum coefficient size / 2).

Now apply these formulas to an example filter having 20 coefficients. The
coefficient having the maximum value is 16 bits wide (e.g. the maximum
coefficient value is +32767 or -32768 in 2's complement representation). In the
present example, the average size of a coefficient approximated by the formula
is 8 bits. For "The Existing Method and Minimization", the total number of
flip-flops (T) required for implementation is 8 * 20 = 160. In contrast,
"Minimization (Already applied as patent)" and "Proposed Minimization (Proposal
for Patent)" would require only 16 flip-flops (the flip-flops are share-able
across all the coefficients, and their number is limited by the coefficient
which has the maximum value). Using the formula for the adder calculation, the
number of adders for all three cases is 8 * 20 = 160 (approx.).

The area for "The Existing Method and Minimization" is 160 T + 160 SA = 480
units. The area for "Minimization (Already applied as patent)" is 16 T + 160 SA
= 336 units. The area for "Proposed Minimization (Proposal for Patent)" is 16 T
+ 160 FA + (extra elements 8 T + 7 FA) = 191 units. [This assumes that the
average number of full adders per bit position is 8. The calculation of the
number of extra elements is generalized here: the extra elements are needed to
terminate the carry-outs of the last (LSB) position. Thus, if the average
number of FAs is 8, the extra elements (7 FA, 8 T) are needed to terminate the
carry-outs of the LSB position. This is shown in "Figure 12".] Thus the current
proposal has an area improvement of approximately 60% {(480-191)/480} in the
coefficient block over "The Existing Method and Minimization".
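
The figures of this example can be reproduced from the formulas given above.
The following Python sketch is only a check of that arithmetic under the stated
assumptions (1 Unit = 1 FA = 1 T, SA = 2 Units, an average of 8 full adders per
bit position); the variable and function names are hypothetical.

```python
N_COEFF = 20     # number of coefficients in the example filter
MAX_BITS = 16    # size of the largest coefficient
FA_PER_BIT = 8   # assumed average number of full adders per bit position

T_UNIT, FA_UNIT, SA_UNIT = 1, 1, 2


def extra_elements(carry_outs):
    """[Ex] block: (carry-outs - 1) FA and (carry-outs) T elements."""
    return {"FA": carry_outs - 1, "T": carry_outs}


avg_bits = MAX_BITS // 2            # pessimistic average coefficient size
adders = avg_bits * N_COEFF         # 8 * 20, the same in all three cases

# "The Existing Method and Minimization": one [T] chain per coefficient.
existing = (avg_bits * N_COEFF) * T_UNIT + adders * SA_UNIT

# "Minimization (Already applied as patent)": shared [T], serial adders.
earlier = MAX_BITS * T_UNIT + adders * SA_UNIT

# "Proposed Minimization": shared [T], plain FA, plus the [Ex] block.
ex = extra_elements(FA_PER_BIT)
proposed = (MAX_BITS * T_UNIT + adders * FA_UNIT
            + ex["T"] * T_UNIT + ex["FA"] * FA_UNIT)

saving = (existing - proposed) / existing
```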

Hardware reduction in block [D] 

For hardware reduction in block [D], the following minimizations are applied:

1) Sharing of common adder terms and using them in block [D].

2) Using subtraction instead of addition when the coefficient is close to a
power of 2, e.g. 63 is better realized as (64-1) than as (32+16+8+4+2+1).

3) Taking out common subtraction operations and maximizing the use of adders.

For the approximate area calculation, the following assumption is made (1 Unit
of Area = 1 FA = 2 HA = 1 T, and SA = SS = 2 Units of Area).

Advantages involved in the present invention

The area is reduced by 50-75% (of the coefficient block [A]) for big filter
structures if all three optimization steps discussed in the previous section
"Hardware reduction in block [D]" are applied.

The last proposed architecture ("Figure 12") is a proper Mealy type machine.

Many times, the output has to be converted back to a parallel data format. In
that case, the outputs from the same shift registers can be used ("Figure 13").

One-bit serial multipliers could still be multiplexed for the proposed
architecture if the specifications permit (i.e. if the frequency of operation
is not very high).

Claims:

1. A device for area efficient realization of coefficient, said device
comprising architecture [A] with hardware sharing techniques and optimization
applied to this architecture, the architecture [A] is connected to coefficient
lines CLin_0, CLin_1, .... CLin_n and/or BLin_0, BLin_1, .... BLin_n coming from
block [E] and/or [F], to be connected to perform filtering operation or a
mathematical computing operation with optimization in hardware and provides a
zero latency clock output, the serial input bit lines of said architecture [A]
are S1, S2, .... Sn [where n represents the number of coefficients of the
filter], the addition terms of the equation [(a0*S1+b0*S2+....+k0*Sn),
(a1*S1+b1*S2+....+k1*Sn), .... (am*S1+bm*S2+....+km*Sn)] are represented as
blocks [B], the said block [B] is a combinational block consisting of full adder
(FA) & full subtractor (FS) elements, the values a0, b0, etc. are (+/-1 or 0),
the connection of elements (FA/FS) to S1, S2....Sn lines and interconnection of
the elements (FA,FS) depend on the value of coefficients, the final output of
last element [FA/FS] of each block [B] is terminated through lines b_1,
b_2,....b_m at [T] elements, the number of T elements in cluster [C] depends on
the size of maximum coefficient value and is share-able for all the coefficients
in the coefficient architecture [A], in the said architecture all the
combinational elements [B] are clustered together as [D] and all the unit delay
elements {T[1], T[2], .... T[m]} are clustered together in [C], thereby
separating the sequential and combinational logic, in the said architecture [A]
the output of element [T] is connected to one of the inputs of combinational
logic of block [B] of next bit position, the interconnections from cluster [C]
to [B] are represented as t_1, t_2, .... t_m, the elements [FA/FS] are arranged
in matrix form FA0_0 to FA0_n in bit position 0, FA1_1 to FA1_n in bit position
1, .... FAm_1 to FAm_n in bit position m whose presence is defined by
coefficient value, the carry-out pin of full adder (FA) of each cluster stage
[B] in the said architecture [A] is fed to input of full adder (FA) of previous
stage cluster [B], i.e. stage preceding the flip-flop (T) element of cluster
[C], in this way the same flip-flop [T] (T1, T2, T3... Tn) is used for
multiplication by "a factor of two" and also in the implementation of the carry
structure in the one bit serial adder, in the said architecture some extra
components represented as block [Ex] are being used for connecting the carry-out
of all the adders/subtractors [FA/FS] of last stage of [D], the elements [FA/FS]
and [T] are used within this block, and hence, the said architecture [A]
structures the circuit into sequential block [C] consisting of [T] elements and
combinational block [D] consisting of [FA,FS] elements, while the [T] elements
of block [C] are common for all the coefficients and are share-able and
positioned at end position of each block [B], the block [D] has combinational
element blocks [B] which are essentially [FA,FS], thereby making share-able
hardware within block [D], and the final output is the output of BITm position.

2. The device as claimed in claim 1, wherein the device provides the area
minimal realization of digital filters based on coefficient architecture [A]
when operated in bit serial fashion, and the device provides hardware
minimization for finite impulse response (FIR) filter, infinite impulse response
(IIR) filter and for other filters and applications related to combinational
logic consisting of delay element (T), multiplier (M), adder and subtractor.

3. The device as claimed in claim 1, wherein further optimization technique in
cluster [D] is done by using common adders (FA) and common subtractors (FS) and
using these shared outputs.

4. The device as claimed in claim 1, wherein further optimization technique in
cluster [D] is done by using subtractor (FS) instead of adders, when the
coefficient value is closer to power of two.

5. The device as claimed in claim 1, wherein further optimization technique in
cluster [D] is done by minimizing the use of subtractor by taking common
subtraction operator and using adder instead.

6. The device as claimed in one of the previous claims (1-5), wherein, when
used in implementation 2 of FIR/IIR filter and similar structure of filters, it
results in quite area efficient realization of the filter, the storage area in
the implementation, referred to as delay elements [Z^-1], is smaller as compared
to implementation 1, which is due to the inherent property of the structure of
implementation 2, and an additional saving in area in filter coefficient
realization design is achieved by using the claimed structure of coefficient
architecture [A] of "Figure 12".
FIGURE 1. Field of invention.
Other application (combinational, sequential logic minimization): Block (A),
any logic comprising the elements/operations shown here.

FIR/IIR filter equation

Y(z) = X(z) [c(0) + c(1) Z^-1 + c(2) Z^-2 + c(3) Z^-3 + .... + c(n) Z^-n]   ....FIR Eq

Y(z) = X(z) [c(0) + c(1) Z^-1 + c(2) Z^-2 + c(3) Z^-3 + .... + c(n) Z^-n]
       / [1 - (b(1) Z^-1 + b(2) Z^-2 + b(3) Z^-3 + .... + b(m) Z^-m)]

where X(z) is the input signal, Z^-1 * X(z) is the signal delayed by one delay,
Y(z) is the output signal, and c(0), c(1), c(2), .... c(n), b(1), b(2), ....
b(m) are integer coefficient values.
FIGURE 2. Bit Serial Elements/components
FIGURE 3. Explanation about components used
FIGURE 4. Bit Serial Implementation of FIR Filter
[Figure: Implementation 1 and Implementation 2. Each shows the input, Block (E)
built from Z^-1 delay elements, the coefficient lines CLin_0 ... CLin_n, and the
coefficient block, Block (A), with multipliers (M) for c(0) ... c(n) and serial
adders (SA). An inset shows the realization of coefficients using a share-able
multiplier.]

FIGURE 5. Example FIR Filter
FIGURE 7. The "Existing Implementation" of the Coefficient Block [A]
FIGURE 9. Minimization (Already applied as patent)
FIGURE 10. Generalized structure "Minimization already applied as patent"
FIGURE 11. Minimizations in "proposal for PATENT"
FIGURE 13. MSBs of the Parallel output are directly available
1. The applicant is hereby notified that the priority claim made in the
international application has been withdrawn in accordance with a notice of
withdrawal received from the applicant on:

16 February 1999 (16.02.99)

The attention of the applicant is drawn to the fact that the withdrawal of the
priority claim will result in the re-calculation of time limits which have not
already expired (see Rule 90bis.3(d)).

2. □ In the case where multiple priorities have been claimed, the above action
relates to the following priority claim(s):

3. A copy of this notification has been sent to the receiving Office and to:

[X] the International Searching Authority (where the international search report has not yet been issued)
[X] the designated Offices (which have already been notified of the receipt of the record copy)
The applicant is hereby notified of the following in respect of the priority
claim(s) made in the international application.

1. □ Correction of priority claim. In accordance with the applicant's notice
received on: , the following priority claim has been corrected to read as
follows:

□ even though the indication of the number of the earlier application is missing.

□ even though the following indication in the priority claim is not the same as
the corresponding indication appearing in the priority document:

2. □ Addition of priority claim. In accordance with the applicant's notice
received on: , the following priority claim has been added:

□ even though the indication of the number of the earlier application is missing.

□ even though the following indication in the priority claim is not the same as
the corresponding indication appearing in the priority document:

3. □ As a result of the correction and/or addition of (a) priority claim(s)
under items 1 and/or 2, the (earliest) priority date is:

4. [X] Priority claim considered not to have been made.

□ The applicant failed to respond to the Invitation under Rule 26bis.2(a) (Form
PCT/IB/316) within the prescribed time limit.

□ The applicant's notice was received after the expiration of the prescribed
time limit under Rule 26bis.1(a).

The applicant's notice failed to correct the priority claim so as to comply
with the requirements of Rule 4.10.

The applicant may, before the technical preparations for international
publication have been completed and subject to the payment of a fee, request the
International Bureau to publish, together with the international application,
information concerning the priority claim. See Rule 26bis.2(c) and the PCT
Applicant's Guide, Volume I, Annex B2(IB).

5. □ In case where multiple priorities have been claimed, the above item(s)
relate to the following priority claim(s):

6. A copy of this notification has been sent to the receiving Office and

[X] to the International Searching Authority (where the international search report has not yet been issued).
[X] the designated Offices (which have already been notified of the receipt of the record copy).
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Facsimile No. (41-22) 740.14.35

Authorized officer

Form PCT/IB/318 (July 1998)



PATENT COOPERATION TREATY

PCT/SG98/00082

From the INTERNATIONAL BUREAU

PCT

NOTIFICATION OF ELECTION

(PCT Rule 61.2)

To:

Assistant Commissioner for Patents
United States Patent and Trademark Office
Box PCT
Washington, D.C. 20231
ETATS-UNIS D'AMERIQUE

in its capacity as elected Office

Date of mailing (day/month/year)

1. The designated Office is hereby notified of its election made:

[X] in the demand filed with the International Preliminary Examining Authority on:
25 April 2000 (25.04.00)

□ in a notice effecting later election filed with the International Bureau on:

2. The election [X] was □ was not made before the expiration of 19 months from
the priority date or, where Rule 32 applies, within the time limit under Rule
32.2(b).

The International Bureau of WIPO
34, chemin des Colombettes
1211 Geneva 20, Switzerland

Authorized officer
S. Mafia

Facsimile No.: (41-22) 740.14.35
Telephone No.: (41-22) 338.83.38
STMICROELECTRONICS (PTE) LTD et al.

1. This international preliminary examination report has been prepared by this
International Preliminary Examining Authority and is transmitted to the
applicant according to Article 36.

2. This REPORT consists of a total of 12 sheets, including this cover sheet.

[X] This report is also accompanied by ANNEXES, i.e. sheets of the description,
claims and/or drawings which have been amended and are the basis for this report
and/or sheets containing rectifications made before this Authority (see Rule
70.16 and Section 607 of the Administrative Instructions under the PCT).

These annexes consist of a total of 12 sheets.

3. This report contains indications relating to the following items:

I     Basis of the report
II    Priority
III   Non-establishment of opinion with regard to novelty, inventive step and
      industrial applicability
IV    Lack of unity of invention
V     Reasoned statement under Article 35(2) with regard to novelty, inventive
      step or industrial applicability; citations and explanations supporting
      such statement
VI    Certain documents cited
VII   Certain defects in the international application
VIII  Certain observations on the international application

Date of submission of the demand: 25/04/2000

Date of completion of this report: 06.02.2001

Name and mailing address of the international preliminary examining authority:
European Patent Office
D-80298 Munich
Tel. +49 89 2399 - 0 Tx: 523656 epmu d
Fax: +49 89 2399 - 4465

Authorized officer: Naumann, O
Telephone No. +49 89 2399 7468
INTERNATIONAL PRELIMINARY EXAMINATION REPORT
International application No. PCT/SG98/00082

I. Basis of the report

1. This report has been drawn on the basis of (substitute sheets which have
been furnished to the receiving Office in response to an invitation under
Article 14 are referred to in this report as "originally filed" and are not
annexed to the report since they do not contain amendments (Rules 70.16 and
70.17).):

Description, pages:
1-8, 10-14, 16, 17, 19-21, 24    as originally filed
9, 15, 18, 22, 23                as received on 28/12/2000 with letter of 18/12/2000

Claims, No.:
1-7                              as received on 28/12/2000 with letter of 18/12/2000

Drawings, sheets:
1/12-7/12, 12/12                 as originally filed
8/12-11/12                       as received on 28/12/2000 with letter of 18/12/2000



2. With regard to the language, all the elements marked above were available or furnished to this Authority in the 
language in which the international application was filed, unless otherwise indicated under this item. 

These elements were available or furnished to this Authority in the following language: , which is: 

□ the language of a translation furnished for the purposes of the international search (under Rule 23.1 (b)). 

□ the language of publication of the international application (under Rule 48.3(b)). 

□ the language of a translation furnished for the purposes of international preliminary examination (under Rule 
55.2 and/or 55.3). 

3. With regard to any nucleotide and/or amino acid sequence disclosed in the international application, the 
international preliminary examination was carried out on the basis of the sequence listing: 

□ contained in the international application in written form. 

□ filed together with the international application in computer readable form. 

□ furnished subsequently to this Authority in written form. 

□ furnished subsequently to this Authority in computer readable form. 

□ The statement that the subsequently furnished written sequence listing does not go beyond the disclosure in 
the international application as filed has been furnished. 

□ The statement that the information recorded in computer readable form is identical to the written sequence 
listing has been furnished. 
4. The amendments have resulted in the cancellation of:

□ the description, pages: 

□ the claims, Nos.: 

□ the drawings, sheets: 

5. □ This report has been established as if (some of) the amendments had not been made, since they have been 

considered to go beyond the disclosure as filed (Rule 70.2(c)): 

(Any replacement sheet containing such amendments must be referred to under item 1 and annexed to this 
report.) 



6. Additional observations, if necessary: 



V. Reasoned statement under Article 35(2) with regard to novelty, inventive step or industrial applicability; 
citations and explanations supporting such statement 

1. Statement

Novelty (N)                     Yes: Claims 1-6
                                No:  Claims 7
Inventive step (IS)             Yes: Claims 1-6
                                No:  Claims 7
Industrial applicability (IA)   Yes: Claims 1-7
                                No:  Claims

2. Citations and explanations
see separate sheet

VII. Certain defects in the international application 

The following defects in the form or contents of the international application have been noted: 
see separate sheet 



VIII. Certain observations on the international application 

The following observations on the clarity of the claims, description, and drawings or on the question whether the 
claims are fully supported by the description, are made: 
see separate sheet 
Re Item I

Basis of the report 

The requirements of Art. 34(2) are not met by the new claim 7 since the subject-matter 
of this claim extends beyond the content of the application as originally filed (Art. 19 
PCT). 

In the new claim 7, the "combinational-sequential logic block" is defined to be "adapted 
to receive n filter transfer function coefficients". Throughout the originally filed 
application, filter coefficients are implemented by hard-wiring connections between 
adders and/or subtractors and timing modules. The subject-matter of a later 
modification of filter coefficients is not contained in the application as originally filed. 
The expression "predetermined transfer function" cannot serve to remove this 
deficiency, since no information is given when or by whom the transfer function is 
predetermined. 

Since the new claim 7 does not fulfill the requirements of Art. 34(2) PCT, it would not be
necessary to give an opinion as regards Rule 66.2(a)(ii). However, on the basis of the 
originally filed application, it can be deduced that the probable intention of the applicant 
was not to write "adapted to receive n filter transfer function coefficients" but rather 
"implements n filter transfer function coefficients". As a service to the applicant a short 
evaluation on claim 7 in this form is given as regards novelty under Item V. 

It is noted, that the expression "transfer function", occurring not only in the new claim 7, 
but also in the new claim 1 , is not used in the application as originally filed. However, 
since the expression "transfer function" is always used in the claims in conjunction with 
a filter, the text concerning filter coefficients and the filter function on pp. 4 and 6 can 
serve as a basis for this amendment. 

Basis for major changes of the amended claims in the application as originally filed: 
Claim 1: original claim 1 and Fig. 12 
Claim 5: Fig. 4 
Claim 6: p. 12, lines 1,2 

Claim 7: Figs. 11, 12, 13 in conjunction with p. 15 and p. 12, lines 1 and 2 (but 
see comments above) 
Re Item V

Reasoned statement under Rule 66.2(a)(ii) with regard to novelty, inventive step 
or industrial applicability; citations and explanations supporting such statement 

Reference is made to the following documents cited in the Search Report: 

D2: WO 94 23493 A (SARAMAEKI TAPIO; RITONIEMI TAPANI (FI); EEROLA
VILLE (FI); HUSU Tl) 13 October 1994

Novel and inventive subject-matter 

Independent claim 1 

Despite numerous deficiencies as regards clarity (see observations under Re Item VIII 
herein below), the intended subject-matter of the claim can now be identified somewhat 
better than in the previous independent claim 1 . 

Closest prior art: D2, showing a filter device with combinational logic (52 to 57, 61, 62, 
63) and sequential logic (58 to 63). 

Difference: Wiring of the blocks as defined in lines 20 to 22 and lines 27 to 30, in 
conjunction with the spatial separation as defined in lines 17 to 19 and lines 3 to 7 on 
claims page 26. 

Problem and solution: Reduction of size, number of components and critical path 
length. The use of delay elements as multipliers by two is also taught in D2, however, 
only in conjunction with the establishment of the establishment of the coefficients via 
bit-shift register 52. The solution of the invention can be viewed as combining the bit- 
shift register (51) and delays (61 , 62, 63), which is achieved by sharing of the delays for 
the multiplication and the carry-over. As a consequence, there are numerous structural 
differences in comparison with the closest prior art. Since no indication can be found in 
the available prior art that would lead a person skilled in the art to modify the closest 
prior art to arrive at the solution of the invention, an inventive step in the sense of Art. 
33 (3) PCT has to be recognised. 
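The two roles of a delay element mentioned above (multiplication by two and carry storage) can be illustrated with a minimal Python sketch. It is not the circuit of D2 or of the application; it assumes an LSB-first bit-serial number format, and the helper names (to_serial, unit_delay, serial_add) are hypothetical.

```python
def to_serial(value, width):
    """Return the bits of a non-negative integer, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_serial(bits):
    """Rebuild the integer from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def unit_delay(bits):
    """One delay element: the output lags the input by one clock (initial state 0)."""
    return [0] + bits[:-1]


def serial_add(a_bits, b_bits):
    """One-bit serial adder: a full adder plus one stored carry bit."""
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        total = a + b + carry
        out.append(total & 1)
        carry = total >> 1  # held in a delay element until the next clock
    return out


WIDTH = 8  # assumed word length, wide enough for the values used here
doubled = from_serial(unit_delay(to_serial(5, WIDTH)))
summed = from_serial(serial_add(to_serial(5, WIDTH), to_serial(14, WIDTH)))
```

In this sketch a one-clock delay of the stream 5 yields 10, and the carry of the serial adder is likewise a value held for one clock, which is why a single delay can plausibly serve both purposes.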

Claims 2 to 6 

Subject to the clarity observations on some of the dependent claims mentioned under 
Re VIII herein below, the claims 2 to 6 also fulfill the requirements of Art. 33 (1) PCT. 
However, only claim 2 does so because it is a dependent claim. Claims 3 to 6 are in 
fact hidden independent claims, since in each of them features of the referenced claim 
1 are replaced (use of "instead"). Yet, the replacements all concern equivalents of non- 
essential features of the invention (subtractors instead of adders under certain 
circumstances, reduction of the number of subtractors, sequential adders/subtractors 
instead of full ones), so that these claims also fulfill the requirements of Art. 33 PCT. 

Claim 7 

Claim 7 in its present form contains subject-matter added to the originally filed 
application. However, as a service to the applicant the following analysis is provided in 
which the expression "adapted to receive n filter transfer function coefficients" is 
deemed to be replaced by "implements n filter transfer function coefficients". 

As far as can be understood from the description (cf. Re. VIII herein below), claim 7 in 
the supposed form does not fulfill the requirements of Art. 33 (1) PCT because it is not 
novel. The filter geometry in document D2 comprises (see Fig. 4 and related text 
passages): a logic block (whole circuit), a combinational-sequential-logic block (52 to 
57, 61, 62, 63), a sequential logic block (58 to 63) with the properties and connected as 
specified in claim 7. It appears that the two properties defined in the new claim 1 

"the output of each delay element is connected to one of the inputs of one of the 
FA or FS elements of a respective block corresponding to a next bit position" 
(lines 20 to 22) 

and 

"wherein the carry-out pin of each FA or FS element of each block is fed to the 
input of a FA or FS element of a previous block such that the same delay element 
is used for multiplication by a factor of two and also for the carry structure in a one 
bit serial adder function" (lines 27 to 30) 

should have been included in claim 7 to avoid any conflict with the requirement of Art. 
33 PCT as regards novelty. 

Re Item VIII 

Certain observations on the international application 
Clarity deficiencies 

Although the new claim 1 is much clearer than the original claim 1, numerous clarity 
deficiencies remain. Therefore claim 1 does not meet the requirements of Article 6 
PCT. The following deficiencies have been observed: 

• Several technical features are introduced with a generic name in combination 
with a letter in square brackets: "logic architecture [A]", "delay blocks [E] and/or 
[F]", "combinational logic blocks [B]", "delay elements [T]", "logic block [C]", "block 
[D]", "block [Ex]". Later these features are then often referred to differently, as, 
e.g., "block [B]", "combinational block [D]", "block [C]", "sequential logic block [C]". 
This results in conflicts as regards a proper assignment of the features in the 
claim and also as regards the references in the figures (Rule 6.2(b) PCT). Similar 
problems arise as regards the "coefficient lines CLin_0, CLin_1, CLin_n [...]", 
the "serial input bit lines" and the generic "lines b_1, b_2, b_m". In order to 
avoid these conflicts, a naming convention as the following would have been more 
appropriate: 

- always refer to "blocks [B]" as "combination logic blocks" 

- always refer to "blocks [C]" as "sequential logic cluster" 

- always refer to "block [D]" as "combination logic cluster" 

- refer to a single "block [B]" as "a single combination logic block" 

- refer to "logic components represented as block [Ex]" as "extra logic 
components" 

- distinguish clearly between "coefficient lines", "serial input bit lines" and 
"interconnection lines". 

The expressions "(B)", "(C)", etc. could then have been used as true reference 
signs, appended to the new expressions. 

• The expression "wherein the final output of a last FA or FS element of each block 
[B] is terminated through lines b_1, b_2, b_m at a plurality of delay elements 
within a logic block [C]" lacks clarity. It is not clear from which block [B] to where 
the lines are interconnected. 

• The expressions "next block [B]", "previous block [B]", "respective block [B] 
corresponding to a next bit position", "last stage of block [D]", etc. lack clarity 
in that no ordering of combinational logic blocks is established through the 
wording in the claim, that would allow to establish a clear meaning of the terms 
"next", "previous", etc. The vague expression "the blocks [B] are clustered 
together in series" is not sufficient, since no details on the clustering in series are 
given subsequently. 

• The expression "interconnections between blocks [C] and [B] are represented as 
t_1, t_2,..., t_m" lacks clarity. The term "represent" does not define a technical 
feature. 

• No interrelation of the "bit in the m th position" with the rest of the device is 
established. The mere statement that it should provide the final output does not 
achieve this. 

The observations as regards the naming conflicts apply also to several of the other 
claims. 

Claim 2 

The expression "the device provides hardware minimization for [filters]" lacks clarity, in 
being a functional feature, that could relate to a physical shrinking of hardware, a 
reduction in size of said filters, both performed by the device, or a reduction of the 
device size itself; in all cases the necessary means or technical features are missing, 
as well as a reference (smaller with respect to which property?). 

Claim 3 

The expression "block [D] uses an FS element" lacks clarity with respect to category. 

Claim 4 

Unclear or deficient expression: "by using a common subtraction operator and 
substituting FA elements instead". The antecedent of "instead" is not clear. 

Claim 5 

The feature "common input line" is neither defined in the claim nor in any of the 
referenced claims. 

Claim 7 

The expression "produce a transfer function output corresponding to the m th bit 
position" lacks clarity in that the correspondence is not established through the 
technical features provided in the claim: the "m th bit position" could relate both to the 
input or the output. Furthermore, the use of the naming "S1,...,Sn" for transfer function 
coefficients is not consistent with the description (see p. 6), probably the variables 
"c0 ... cn" would have been more appropriate. 
Re Item VII 

Certain defects in the international application 

The present set of claims is a considerable improvement compared to the original set of 
claims. However, practically all the deficiencies that applied to the original set of 
claims still apply to the description. Therefore, when redrafting the description, which 
without doubt will be necessary for any regional or national phase, not only the 
observations as regards the present set of claims will have to be dealt with, but also a 
thorough analysis as regards the naming convention, the logical definition and the 
reference signs will have to be undertaken. As an example the text in the paragraph 
bridging p. 12 and 13 is analysed: 

• The technical field of the apparatus is not clear. The introductory expression "A 
device for area efficient realization of coefficient, said device comprising [...]" 
merely defines a generic device. The expression "for area efficient realization of 
coefficient" itself is not only grammatically incorrect but also lacks any technical 
meaning. 

• Several features appear to be merely defined in terms of their function, i.e. 
the results which are to be achieved. This relates to the following expressions 
(with further comments to additional deficiencies): 

a. "coefficient lines [...] to be connected to perform filtering operation or a 
mathematical computing operation with optimization in hardware and 
provides a zero latency output". Not only is this expression grammatically 
incorrect ("provides"), it also remains unclear whether said "optimization" is 
due to the "filtering or mathematical computing operation", whether it is a 
result of the manner in which the lines are connected, or whether the 
optimization is the area efficient realization. 

b. "elements [FA/FS] are arranged in matrix form [...] in bit position m whose 
presence is defined by coefficient value". Not only is this expression 
grammatically incorrect ("by coefficient value"), it also lacks clarity in that the 
antecedent of "whose" cannot be readily identified ("elements", "matrix", "bit 
position"), and that the term "presence" remains clouded as regards a 
physical presence of a feature or the existence of a value of a coefficient. 

c. "the same flip-flop is used for a multiplication by a factor of two". Since 
the preceding definitions lack clarity, this expression constitutes a functional 
feature. It cannot be used to define properties of the interconnection of 
adders and delays; for an improvement in readability it might be appropriate 
to suppress this expression if said interconnections and the related features 
are properly defined. 

d. "the said architecture [A] structures the circuit into [...]". In addition to the lack 
of clarity as regards the technical features by which said structuring is 
performed, the expression leaves the reader in doubt whether the 
architecture as such performs the structuring or the architecture is 
structured. 

• It appears that several paragraphs, having been drafted independently making 
use of different figures, have been merely collated (cut and paste?). As a 
consequence, different technical features are referred to with the same term, the 
same technical feature is referred to with different terms ("block [B]", "block [C]", 
"cluster [C]", "sequential block [C]", etc.) and grammatical errors have occurred. 
Several technical features are referred to with a generic name in combination with 
a letter in square brackets (e.g. "block [E]"), in parentheses (e.g. "full adder (FA)"), 
or both, (e.g. "element (FA/FS)" and "element [FA/FS]"), or even without a name 
(e.g. "[D]"). Therefore it is not clear whether this letter refers to a feature in the 
drawings, whether it is intended as a modifier of the feature or as a name for the 
feature itself. Furthermore, several features are used without explicitly defining 
them beforehand, e.g. "filter". As a consequence, a proper assignment of the 
following expressions to specific technical features is not possible: 

e. "coefficient lines CLin_0 [...]" 

f. "block [E]", "block [F]" 

g. "serial input bit line [...] S1 [...]" 

h. "filter" 

i. "blocks [B]", "block [B]" 

j. "elements (FA/FS)", "S1, S2, ... Sn lines" 

k. "coefficients" 

l. "lines b_1, b_2, b_m" 

m. "[T] elements", "T elements" 

n. "cluster [C]" 

o. "size of maximum coefficient value", "all the coefficient" 

p. "coefficient architecture [A]" 

q. "all the unit delay elements {T[1], T[2],...,T[m]}", "[C]" 

r. "sequential and combinational logic" 

s. "combinational logic of block [B]" 

t. "next bit position" 

u. "interconnections from cluster [C] to [cluster?] [B] is represented as t_1, 
t_2,..., t_m" 

v. "FA0_0 to FA0_n", "FA1_1 to FA1_n", "FAm_1 to FAm_n" 

w. "cluster stage [B]" 

x. "previous stage cluster [B]" 

y. "flip-flop [T] element" 

z. "Flip-Flop [T] (T1,T2,T3,...,Tn)" 

aa. "carry structure", "one-bit serial adder" 

bb. "block [Ex]" 

cc. "last stage of [D]" 

dd. "this block", "the circuit" 

ee. "sequential block [C]", "combinational [D]" 

ff. "combinational element block [B]" 

gg. "essentially [FA,FS]" 

hh. "block [D]", "final output" 

ii. "BITm position" 

Further deficiencies of the application 

The expressions "already applied as patent" and the expressions "proposal for 
PATENT" on page 22 (lines 22, 23) of the description are not appropriate in that they 
do not allow a clear distinction between prior art and the subject-matter of the present 
application for which protection appears to be sought. In any case, for the art referred 
to with "already applied as patent" an appropriate document should have been 
identified in the description (Rule 5.1(a) (ii) PCT). 

In several instances grammatical or typing errors occur in the description in which 
sentences start or stop unexpectedly (e.g. p. 11 "Hence reducing the number of 
flip-flop(s).", p. 19, "we get the generalized equation for architecture under claim as."). 

The images on p. 10 and p. 15 of the description would have had to be either moved to 
the drawings section of the application or deleted, in order for the application not to 
contravene against Rule 11.10 (a) PCT, according to which the description should not 
contain drawings. The corresponding text passages would have had to be adapted 
accordingly. 

Contrary to the requirements of Rule 5.1 (a)(ii) PCT, the relevant background art 
disclosed in the document D2 is not mentioned in the description, nor is this document 
identified therein. 

Upon redrafting the description, a thorough grammatical check and a re-reading of the 
description as a whole might then be appropriate to avoid logical misunderstandings 
and avoid any deficiencies as regards lack of support by the description. 
Minimization 

This structure reduces the hardware of the coefficient block [A] by having shareable 
elements in the coefficients, even if the coefficient lines CLin_0, CLin_1, ... are 
not commonly connected. This structure reduces the area by approximately 30-50% 
relative to "Figure 7" of the drawings by reducing the number of components and by 
sharing components. Here the optimization techniques are 
illustrated with examples, and the end of this section gives the generalized equation 
and structure of the device. 

Continuing the same FIR filter example and using "Equation 3" of the previous 
section: 

y(nT) = 5*S1 + 14*S2 + 25*S3 + 30*S4 

y(nT) = (4+1)*S1 + (8+4+2)*S2 + (16+8+1)*S3 + (16+8+4+2)*S4 

The applicants proceed to share the shift registers (multiply by 2) of the design. 

      = (S3+S4)*16 + (S2+S3+S4)*8 + (S1+S2+S4)*4 + (S2+S4)*2 + (S1+S3) 
      = (S1+S3) + 2*(S2+S4 + 2*(S1+S2+S4 + 2*(S2+S3+S4 + 2*(S3+S4))))   (EQ 4) 

Finding out the common additive factors 

A1 = S2+S4 

A2 = S3+S4 

The "Equation 4" can be further reduced as 

y(nT) = (S1+S3) + 2*(A1 + 2*(S1+A1 + 2*(S2+A2 + 2*A2)))   (EQ 5) 
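As a plausibility check of the regrouping, the following Python sketch compares the direct weighted sum with the nested forms of Equation 4 and Equation 5 over a range of integer inputs. The function names are hypothetical, and the inputs are treated as plain integers rather than serial bit streams, which is an assumption made only for this check.

```python
from itertools import product


def direct(s1, s2, s3, s4):
    """Weighted sum with coefficients 5, 14, 25, 30."""
    return 5 * s1 + 14 * s2 + 25 * s3 + 30 * s4


def nested_eq4(s1, s2, s3, s4):
    """Equation 4: every factor of two corresponds to one shared unit delay."""
    return (s1 + s3) + 2 * ((s2 + s4) + 2 * ((s1 + s2 + s4)
                            + 2 * ((s2 + s3 + s4) + 2 * (s3 + s4))))


def nested_eq5(s1, s2, s3, s4):
    """Equation 5: common sums A1 and A2 are computed once and reused."""
    a1 = s2 + s4
    a2 = s3 + s4
    return (s1 + s3) + 2 * (a1 + 2 * ((s1 + a1) + 2 * ((s2 + a2) + 2 * a2)))


# Exhaustive comparison over a small signed input range.
all_match = all(
    direct(*s) == nested_eq4(*s) == nested_eq5(*s)
    for s in product(range(-3, 4), repeat=4)
)
```

The nested forms need four multiplications by two in total, independent of the number of inputs, which is the sharing of shift registers referred to in the text.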

The implementation flow for this equation and the hardware implementation are 
illustrated here; the hardware implementation is also shown in "Figure 9" and 
"Figure 10" of the drawings [e.g. SA(1), SA(2) etc. are used for representing 
adders, T(1), T(2) etc. are used for representing the unit delays]. In the flow of 
implementation, S1, S2, S3, S4 represent four inputs. The primary addition is 
done using serial adders SA(1), SA(3), SA(9) representing addition of terms 
As shown in the above implementation flow, the equation defines the bit position 
as BIT0 to BIT4, which is the position of "multiplication by power of two" (e.g. 
BIT0 represents multiplication by 2^0). At BIT0 position addition of S3+S4 is 
performed and the output is terminated at T(1). The output of T(1) defines the next 
bit position BIT1, which performs addition of S2+S3+S4 using the [FA] and also 
the output of T(1). The output of this addition is again terminated at T(2). The 
structure is repeated in next BIT positions. The carryout of [FA]'s are fed to the 
b) Block [D] having combinational blocks [B] which are essentially FA, FS. Not 
only is the hardware within a block [B] shareable but also across various [B] 
blocks. Hence the component hardware within the [D] block is minimized. 

The minimization in block [D] is achieved by using the following minimization 
techniques: 

1) Sharing of common adder terms, i.e. utilizing the common adder multiple times. 

2) Using subtraction instead of addition when the coefficient is close to a power of 
2, e.g. 63 is better realized as (64-1) than (32+16+8+4+2+1). In the former case the 
number of subtractors is 1 as compared to 5 adders in the latter case. 

3) Taking common subtraction operations and maximizing the use of adders. 
This is because subtraction is expensive as compared to the addition operation. 
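The second technique can be illustrated with a short Python sketch that counts the add/subtract operations needed for a coefficient. The text only gives the example 63 = 64-1; the signed-digit recoding used below (the non-adjacent form) is a standard method substituted here for illustration and is not stated to be the applicants' procedure. The function names are hypothetical.

```python
def binary_terms(c):
    """Powers of two in the plain binary expansion of a positive integer c."""
    return [1 << i for i in range(c.bit_length()) if (c >> i) & 1]


def signed_terms(c):
    """Signed powers of two (non-adjacent form) that sum to a positive integer c."""
    terms = []
    weight = 1
    while c:
        if c & 1:
            digit = 2 - (c & 3)  # +1 if c % 4 == 1, -1 if c % 4 == 3
            terms.append(digit * weight)
            c -= digit
        c >>= 1
        weight <<= 1
    return terms


def operations(terms):
    """Combining k terms takes k - 1 adders or subtractors."""
    return len(terms) - 1


adders_binary_63 = operations(binary_terms(63))   # 32+16+8+4+2+1
ops_signed_63 = operations(signed_terms(63))      # 64-1
```

For 63 the sketch reproduces the counts given in the text: five adders for the binary expansion against a single subtraction for 64-1.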

In the present minimization of "Figure 11", the approximate area calculation is [9 
FA + 2 HA + 6 T = 16 Units]. As the applicants have seen, the area in 
minimization under section "The Existing Method and Minimization" and 
"Figure 7" is 35 units. The area in minimisation under section "Minimisation" ("Figure 9") is 
22 units. The current minimisation is an improvement of 54% {(35-16)/35} and 27% {(22-16)/22} 
of the coefficient block respectively over the two structures (assuming 1 Unit = 1 FA 
= 2 HA = 1 T and serial adder = 2 Units). 
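The unit-area arithmetic of this paragraph can be re-traced with a few lines of Python. The dictionary of unit areas encodes the stated assumption (1 Unit = 1 FA = 2 HA = 1 T, serial adder = 2 Units); the names UNIT_AREA, area and improvement are hypothetical.

```python
# Unit areas as assumed in the text: 1 Unit = 1 FA = 2 HA = 1 T, SA = 2 Units.
UNIT_AREA = {"FA": 1.0, "HA": 0.5, "T": 1.0, "SA": 2.0}


def area(counts):
    """Total area in units for a dictionary of element counts."""
    return sum(UNIT_AREA[name] * n for name, n in counts.items())


def improvement(old, new):
    """Relative area saving in percent."""
    return 100.0 * (old - new) / old


figure_11 = area({"FA": 9, "HA": 2, "T": 6})
vs_figure_7 = improvement(35, figure_11)   # against 35 units
vs_figure_9 = improvement(22, figure_11)   # against 22 units
```

With these assumptions the sketch gives 16 units for "Figure 11" and savings of roughly 54% and 27%, matching the quoted figures.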



GENERALIZED STRUCTURE OF THE INVENTION 

The invention provides an area efficient realization of filter coefficient block [A] 
applicable to filter devices such as FIR, IIR and other filter structures based on 
this block. This architecture is also applicable to combinational and sequential 
logic consisting of adders, subtractors, multipliers and flip-flops [T]. This 
are "Minimisation" in "Figure 10" and "Generalized structure of the invention" in "Figure 
12" of the drawings. 

1) The number of flip-flop [T] elements in the coefficient block depends on the size 
of all the coefficients. The approximate and pessimistic formula for calculation of total 
flip-flops (T) in the coefficient block in "The Existing Method and Minimisation" is [= 
average size of coefficient * number of coefficients] ("Figure 8"), where the average size of 
a coefficient is calculated pessimistically as (maximum coefficient size / 2). While in 
"Minimisation" and "Generalised structure", the number of [T] elements is [= maximum 
size of coefficient, since the flip-flops (T) are shareable here]. 


2) The appropriate formula for calculation of total adders (SA) in the coefficient block of 
the above-mentioned cases is [= adders per coefficient * number of coefficients]. Adders per 
coefficient block depend solely on the value of the coefficient. Assuming no optimization, in the 
worst case the total number of adders is (= number of coefficients * maximum 
coefficient size / 2). 

Now the mentioned formulas are applied to an example filter having 20 coefficients. The 
coefficient having the maximum value is 16 bits wide (e.g. the maximum coefficient value is +32767 
or -32768 in 2's complement representation). In the present example, the average size of a 
coefficient approximated by the formula is 8 bits. For "The Existing Method and 
Minimisation", the total number of flip-flops (T) required for implementation is 8*20=160. In 
contrast to this, "Minimization (Already applied as patent)" and "Proposed Minimization 
(Proposal for Patent)" would require only 16 flip-flops (the flip-flops of all the 
coefficients are shareable and their number is limited by the coefficient which has the maximum 
value). Using the formula for the adder calculation, the number of adders for the three cases 
is 8 * 20 = 160 (approx). 

Area calculation for "The Existing Method and Minimization" is 160 T + 160 SA = 480 
units. Area calculation for Figures 9 and 10 is 16 T + 160 SA = 336. Area calculation for 
Figures 11 and 12 "Proposed Minimization" is 16 T + 160 FA + (extra elements 8 T + 7 FA) = 
191. [Assuming that the average number of full adders per bit position is 8. We will generalize 
the calculation of the number of extra elements here. These extra elements are needed to 
terminate the carry-out of the last (LSB) position. Thus if the average number of FAs is 8, the 
extra elements (7 FA, 8 T) are needed to terminate the carry-outs of the LSB position. This is 
shown in "Figure 12"]. Thus we see that the current proposal has an area improvement of 
approximately 60% {(480-191)/480} of the coefficient block over "The Existing Method and 
Minimization". 
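The 20-coefficient example can be followed step by step in the Python sketch below. It uses the pessimistic formulas from the text (average coefficient size = maximum size / 2) and the unit areas 1 T = 1 FA = 1 Unit, 1 SA = 2 Units; the variable names are hypothetical and the extra elements (8 T, 7 FA) are taken directly from the text rather than derived.

```python
N_COEFF = 20    # number of coefficients in the example filter
MAX_BITS = 16   # size of the largest coefficient in bits

avg_bits = MAX_BITS // 2            # pessimistic average size: 8 bits

# Flip-flop counts
t_existing = avg_bits * N_COEFF     # one delay chain per coefficient
t_shared = MAX_BITS                 # one shared delay chain

# Worst-case adder count, the same for all three structures
adders = N_COEFF * avg_bits

# Areas in units (T = 1, FA = 1, SA = 2)
area_existing = t_existing * 1 + adders * 2
area_fig_9_10 = t_shared * 1 + adders * 2
extra_t, extra_fa = 8, 7            # carry termination elements quoted in the text
area_fig_11_12 = t_shared * 1 + adders * 1 + extra_t * 1 + extra_fa * 1

saving_percent = 100.0 * (area_existing - area_fig_11_12) / area_existing
```

Under these assumptions the three areas come out as 480, 336 and 191 units, and the saving of the last structure over the first is about 60%.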

Hardware reduction in block [D] 

For hardware reduction in block [D], the following minimizations are applied. 

1) sharing of common adder terms and using them in block [D] 

2) using subtraction instead of addition when the coefficient is close to a power of 2, e.g. 
63 is better realized as (64-1) than (32+16+8+4+2+1) 

3) taking common subtraction operations and maximizing the use of adders 

For approximate area calculation the following assumption is made (1 Unit of Area = 1 FA = 
2 HA = 1 T and SA = SS = 2 Units of Area). 






Advantages involved in the present invention 

The Area gets reduced by 50-75% (of the coefficient block [A]) for big filter structures, if 
all the 3 optimization steps, as discussed in previous section "Hardware reduction in block 
[D]", are applied. 
Claims: 

1. A filter device comprising logic architecture [A] connected to coefficient lines 

CLin_0, CLin_1 ... CLin_n and/or BLin_0, BLin_1 ... BLin_n coming from delay 

blocks [E] and/or [F], wherein S1, S2 ... Sn are serial input bit lines of said architecture 
5 [A], n representing a number of coefficients of the filter transfer function; 

wherein said architecture [A] includes m combinational logic blocks [B] consisting 
of full adder (FA) and full subtractor (FS) elements and the blocks [B] provide the addition 
terms of the filter transfer function [(a0*S1+b0*S2+...+k0*Sn) 
(a1*S1+b1*S2+...+k1*Sn)...(am*S1+bm*S2+...+km*Sn)], where a0, b0...k0, a1, 
10 b1...k1, am, bm...km are (+/-1 or 0); 

wherein the connection of the FA and/or FS elements to the S1, S2 ... Sn lines and 
interconnection of the FA and FS elements depend on the value of coefficients a0...km; 

wherein the final output of a last FA or FS element of each block [B] is terminated 
through lines b_1, b_2...b_m at a plurality of delay elements [T] within a logic block [C], 
15 the number of delay elements [T] depending on the size of a maximum coefficient value of 
the filter transfer function and being shareable by blocks [B]; 

wherein in said architecture [A] the blocks [B] are clustered together in series to 
form block [D] and delay elements [T] are clustered together in block [C], thereby spatially 
separating the sequential and combinational logic blocks; 
20 wherein in said architecture [A], the output of each delay element [T] is connected 

to one of the inputs of one of the FA or FS elements of a respective block [B] 
corresponding to a next bit position; 

wherein the interconnections between blocks [C] and [B] are represented as t_1, 
t_2, ...t_m, the FA and/or FS elements are arranged in matrix form such that elements 
25 FA0_0 to FA0_n correspond to bit position 0, elements FA1_1 to FA1_n correspond to bit 
position 1, and elements FAm_1 to FAm_n correspond to bit position m; 

wherein the carry-out pin of each FA or FS element of each block [B] is fed to the 
input of a FA or FS element of a previous block [B] such that the same delay element [T] 
(T1, T2, T3...Tn) is used for multiplication by a factor of two and also for the carry 
30 structure in a one bit serial adder function; 

wherein in the said architecture [A] logic components represented as block [Ex] are 
used for connecting the carry-out of the FA and FS elements in a last stage of block [D] and 
a plurality of FA and/or FS elements and delay elements [T] are used within block [Ex]; 

whereby said architecture [A] is structured into sequential block [C] consisting of 
delay elements and combinational block [D] consisting of FA and/or FS elements, while 
the delay elements [T] of block [C] are each positioned at a respective end position of each 
block [B], block [D] includes combinational element blocks [B] which essentially include 
FA and/or FS elements, thereby forming shareable logic elements within block [D]; and 
a final output of the architecture [A] is a bit in the m th position. 

2. The device as claimed in claim 1, wherein when operated in bit serial fashion, the 
device provides hardware minimization of a finite impulse response (FIR) filter, an infinite 
impulse response filter (IIR) or for other filters and application related to combinational 
logic consisting of delay elements [T], multiplier elements [M] and FA and/or FS 
elements. 

3. The device as claimed in claim 1 wherein block [D] uses an FS element instead of 
an FA element when a filter transfer function coefficient value is close to a power of two. 

4. The device as claimed in claim 1 wherein block [D] minimises the use of FS 
20 elements by using a common subtraction operator and substituting FA elements instead. 

5. The device as claimed in any one of the preceding claims wherein coefficient lines 
CLin_0 ... CLin_n are not derived from a common input line but are instead respectively 
delayed by 0 ... n unit delays prior to input into said architecture [A]. 

25 

6. The device as claimed in any preceding claim, wherein instead of FA and FS 
elements, sequential adder (SA) and sequential subtractor (SS) elements are used. 

7. A bit serial FIR filter device including: 

30 a logic block [A] adapted to receive an (m+1)-bit input and to produce a transfer 

function output corresponding to the m th bit position, block [A] including: 
a combinational-sequential logic block [D] adapted to receive n filter transfer 
function coefficients S1, S2 ... Sn of a predetermined transfer function and including m+1 
combinational logic blocks B0, B1, ... Bm; and 

a sequential logic block [C] having m delay elements T1, T2, ... Tm for receiving 

respective outputs of blocks B0, B1, ... Bm-1 and for providing delayed outputs to 

respective blocks B1, B2, ... Bm; 

wherein each block Bx, for x=0,1...m, includes a plurality of serial subtractor or 
adder elements [SA], up to a maximum of n, SAx_1, SAx_2 ... SAx_n, for providing a 
coefficient multiplication function for each block Bx; and 

wherein block Bm outputs said transfer function output according to said transfer 
function, based on said (m+1)-bit input. 
FIGURE 9. Minimization 
FIGURE 10. Generalized structure 